[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191

aabhinavg1 · 2025-07-11T10:30:50Z

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

Alive2 Proofs for AVG Operations

Opcode	Operation Type	Alive2 Proof Link
`AVGFLOORS`	Signed Floor Avg	Alive2 Link
`AVGCEILS`	Signed Ceil Avg	Alive2 Link
`AVGFLOORU`	Unsigned Floor Avg	Alive2 Link
`AVGCEILU`	Unsigned Ceil Avg	Alive2 Link

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Fixes: #147696

llvmbot · 2025-07-11T10:31:23Z

@llvm/pr-subscribers-llvm-selectiondag

Author: None (aabhinavg1)

Changes

…ILS as safe

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

AVGFLOORS: https://alive2.llvm.org/ce/z/JWZcNr
AVGCEILS: https://alive2.llvm.org/ce/z/cW3jrR

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>

Full diff: https://github.com/llvm/llvm-project/pull/148191.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
(added) llvm/test/CodeGen/AArch64/avgfloor-u8.ll (+16)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::UADDSAT:
   case ISD::SSUBSAT:
   case ISD::USUBSAT:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   case ISD::MULHU:
   case ISD::MULHS:
   case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK:       add
+; CHECK:       lsr
+; CHECK:       ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+  %conv = zext i8 %a to i16
+  %conv1 = zext i8 %b to i16
+  %add = add nuw nsw i16 %conv1, %conv
+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}

llvmbot · 2025-07-11T10:31:23Z

@llvm/pr-subscribers-backend-aarch64

Author: None (aabhinavg1)

Changes

…ILS as safe

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

AVGFLOORS: https://alive2.llvm.org/ce/z/JWZcNr
AVGCEILS: https://alive2.llvm.org/ce/z/cW3jrR

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>

Full diff: https://github.com/llvm/llvm-project/pull/148191.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
(added) llvm/test/CodeGen/AArch64/avgfloor-u8.ll (+16)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::UADDSAT:
   case ISD::SSUBSAT:
   case ISD::USUBSAT:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   case ISD::MULHU:
   case ISD::MULHS:
   case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK:       add
+; CHECK:       lsr
+; CHECK:       ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+  %conv = zext i8 %a to i16
+  %conv1 = zext i8 %b to i16
+  %add = add nuw nsw i16 %conv1, %conv
+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}

jayfoad · 2025-07-11T11:02:22Z

Code change looks fine but I don't understand what the test you added actually shows.

Differential Revision:

Don't add this! We haven't used this since we moved away from Phabricator for code reviews.

jayfoad · 2025-07-11T11:05:45Z

Also, I don't understand how your alive2 links are relevant. They verify an expansion of avgfloors/avgceils, but what does this have to do with introducing poison?

aabhinavg1 · 2025-07-11T11:12:49Z

Hi @jayfoad
Thanks for pointing that out,

I included the Alive2 links to show that the lowering of AVGFLOORS and AVGCEILS avoids poison, but I get that it doesn't directly prove that the SelectionDAG op itself is safe.

If you don’t mind, could you suggest what kind of test cases would better show that these ops don’t introduce poison or undef?

RKSimon · 2025-07-11T11:23:40Z

You need to remove the noundef and compare the effect of freezing the result vs the inputs:

----------------------------------------
define i8 @src(i8 %a, i8 %b) {
#0:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %res = add i32 %and, %shr
  %trunc = trunc i32 %res to i8
  %freeze = freeze i8 %trunc
  ret i8 %freeze
}
=>
define i8 @tgt(i8 %a, i8 %b) {
#0:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %res = add i32 %and, %shr
  %trunc = trunc i32 %res to i8
  ret i8 %trunc
}
Transformation seems to be correct!

(if you remove either of the dst freezes it should fail)

RKSimon · 2025-07-11T11:25:19Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:


you need test coverage for all 4 patterns

HI @RKSimon

Thanks for the feedback!

Alive2 Proofs for AVG Operations

Opcode Operation Type Alive2 Proof Link

AVGFLOORS Signed Floor Avg Alive2 Link

AVGCEILS Signed Ceil Avg Alive2 Link

AVGFLOORU Unsigned Floor Avg Alive2 Link

AVGCEILU Unsigned Ceil Avg Alive2 Link

Let me know if need to add anything else.

These proofs are weird: the whole point of the and/xor/ashr/add implementation is that you don't need to extend to a wider type, you can do all operations in i8.

davemgreen · 2025-07-11T11:28:21Z

Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly.

aabhinavg1 · 2025-07-11T12:07:20Z

Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly.

Hi @davemgreen,

Thanks for the suggestion!

I've drafted the following test cases in llvm/test/CodeGen/AArch64/hadd-combine.ll to exercise the aarch64.neon.uhadd and aarch64.neon.urhadd intrinsics:

; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s

declare <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8>, <8 x i8>)
declare <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8>, <8 x i8>)
declare <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8>, <16 x i8>)
declare <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8>, <16 x i8>)

; CHECK-LABEL: test_uhadd_v8i8:
; CHECK: uhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_uhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
  %res = call <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
  ret <8 x i8> %res
}

; CHECK-LABEL: test_urhadd_v8i8:
; CHECK: urhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_urhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
  %res = call <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
  ret <8 x i8> %res
}

; CHECK-LABEL: test_uhadd_v16i8:
; CHECK: uhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_uhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %res = call <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret <16 x i8> %res
}

; CHECK-LABEL: test_urhadd_v16i8:
; CHECK: urhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_urhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %res = call <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret <16 x i8> %res
}

Would this be acceptable for verifying that the uhadd/urhadd intrinsics get correctly lowered into the corresponding DAG avg patterns?

RKSimon · 2025-07-11T13:19:39Z

What could be done is to fold based on the knownbits/signbits result of a avgu/avgs node - if the freeze is moved then the knownbits/signbits calculation is exposed and can allow the fold to occur.

aabhinavg1 · 2025-07-11T14:54:54Z

Hi @RKSimon

here is the test could you please look in to it if i have not missed anything

; RUN: opt -passes=instcombine -S < %s | FileCheck %s

; === Signed Floor Average (AVGFLOORS) ===

define i8 @avg_signed_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  ret i8 %res
}

define i8 @avg_signed_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Signed Ceil Average (AVGCEILS) ===

define i8 @avg_signed_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = ashr i32 %add1, 1
  %res = trunc i32 %shr to i8
  ret i8 %res
}

define i8 @avg_signed_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = ashr i32 %add1, 1
  %res = trunc i32 %shr to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Unsigned Floor Average (AVGFLOORU) ===

define i8 @avg_unsigned_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = zext i8 %fa to i32
  %b32 = zext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = lshr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  ret i8 %res
}

define i8 @avg_unsigned_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = zext i8 %a to i32
  %b32 = zext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = lshr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Unsigned Ceil Average (AVGCEILU) ===

define i8 @avg_unsigned_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = zext i8 %fa to i32
  %b32 = zext i8 %fb to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = lshr i32 %add1, 1
  %res = trunc i32 %shr to i8
  ret i8 %res
}

define i8 @avg_unsigned_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = zext i8 %a to i32
  %b32 = zext i8 %b to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = lshr i32 %add1, 1
  %res = trunc i32 %shr to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

I’ve added avg-poison-safety.ll with test coverage for all four ops:

AVGFLOORS (signed floor)
AVGCEILS (signed ceil)
AVGFLOORU (unsigned floor)
AVGCEILU (unsigned ceil)

Could you please review if I’ve missed anything?

RKSimon · 2025-07-14T08:17:24Z

llvm/test/CodeGen/AArch64/avgfloor-u8.ll

+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}


The problem you have if you attempt to freeze the expanded avg pattern - is that the freeze might get folded through instructions BEFORE the avg node is created. You're going to have to use the aarch64 hadd/rhadd intrinsics and rely on a fold that uses knownbits/signbits that is blocked if the freeze it still present and(avg(and(x,const0),and(y,const1)),const2) might work for avgceilu/avgflooru for instance.

llvm\test\CodeGen\X86\freeze-binary.ll has some examples of frozen calls of other intrinsics to give you some ideas that you could try for @llvm.aarch64.neon.shadd.v8i16 etc.

Hi @RKSimon
I've now added a separate AArch64 test (avg-freeze.ll) that demonstrates freeze applied after the hadd/rhadd intrinsics (shadd, srhadd, uhadd, urhadd). This avoids blocking instruction selection and ensures correctness. The tests include:

freeze_shadd_vec

freeze_srhadd_vec

freeze_uhadd_vec

freeze_urhadd_vec

The logic is similar to how freeze-preserving intrinsic selection is handled in freeze-binary.ll for X86.

testcase

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=AARCH64 ; This test verifies that applying `freeze` to the result of AArch64 hadd/rhadd intrinsics ; does not interfere with instruction selection. ; The intrinsic is computed first, then frozen. Since `freeze` is applied after, ; it does not block the selection of `shadd`, `srhadd`, `uhadd`, or `urhadd`. declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>) define <8 x i16> @freeze_shadd_vec(<8 x i16> %a, <8 x i16> %b) { ; AARCH64-LABEL: freeze_shadd_vec: ; AARCH64: shadd v0.8h, v0.8h, v1.8h ; AARCH64-NEXT: ret %avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %a, <8 x i16> %b) %frozen = freeze <8 x i16> %avg ret <8 x i16> %frozen } define <8 x i16> @freeze_srhadd_vec(<8 x i16> %a, <8 x i16> %b) { ; AARCH64-LABEL: freeze_srhadd_vec: ; AARCH64: srhadd v0.8h, v0.8h, v1.8h ; AARCH64-NEXT: ret %avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %a, <8 x i16> %b) %frozen = freeze <8 x i16> %avg ret <8 x i16> %frozen } define <8 x i16> @freeze_uhadd_vec(<8 x i16> %a, <8 x i16> %b) { ; AARCH64-LABEL: freeze_uhadd_vec: ; AARCH64: uhadd v0.8h, v0.8h, v1.8h ; AARCH64-NEXT: ret %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %a, <8 x i16> %b) %frozen = freeze <8 x i16> %avg ret <8 x i16> %frozen } define <8 x i16> @freeze_urhadd_vec(<8 x i16> %a, <8 x i16> %b) { ; AARCH64-LABEL: freeze_urhadd_vec: ; AARCH64: urhadd v0.8h, v0.8h, v1.8h ; AARCH64-NEXT: ret %avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %a, <8 x i16> %b) %frozen = freeze <8 x i16> %avg ret <8 x i16> %frozen }

llvm-lit report

./llvm-lit ../../llvm/test/CodeGen/AArch64/avg-freeze.ll -- Testing: 1 tests, 1 workers -- PASS: LLVM :: CodeGen/AArch64/avg-freeze.ll (1 of 1) Testing Time: 0.16s Total Discovered Tests: 1 Passed: 1 (100.00%)

Let me know if you'd like to see this added to a different file or want to keep a different name for the file coverage!
or i need to modify anything in the testcase

What I had in mind was something like this:

define <8 x i16> @src(<8 x i16> %a0, <8 x i16> %a1) { %m0 = and <8 x i16> %a0, splat (i16 15) %m1 = and <8 x i16> %a1, splat (i16 15) %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1) %frozen = freeze <8 x i16> %avg %mask = and <8 x i16> %frozen, splat (i16 31) ret <8 x i16> %mask } define <8 x i16> @tgt(<8 x i16> %a0, <8 x i16> %a1) { %f0 = freeze <8 x i16> %a0 %f1 = freeze <8 x i16> %a1 %m0 = and <8 x i16> %f0, splat (i16 15) %m1 = and <8 x i16> %f1, splat (i16 15) %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1) %mask = and <8 x i16> %avg, splat (i16 31) ret <8 x i16> %mask }

But this doesn't work yet due to a missing support for AArch64ISD::MOVIshift in AArch64TargetLowering::computeKnownBitsForTargetNode

You're right, the src/tgt version highlights a more interesting optimization around freeze placement and known bits. As you mentioned, it doesn't work yet because support for AArch64ISD::MOVIshift is missing in computeKnownBitsForTargetNode .

What you said makes sense we can look into adding that support in a follow-up patch. Once it's there, we can add the more advanced test to properly check how known bits interact with freeze.

For this patch, I suggest we go ahead with the current tests. They still serve the main purpose, which is to make sure that freeze after hadd/rhadd doesn't interfere with instruction selection.

Let me know if you're good with that. If yes, we can land this one and open a new patch for the MOVIshift support and the src/tgt test.

Once #148634 lands, you should be able to quickly remove avgfloor-u8.ll and add suitable test coverage to hadd-combine.ll similar to @src (call them something like @haddu_freeze/@rhaddu_freeze etc.).

Hi @RKSimon Is this test structure OK?
patterns into this unified hadd-freeze.ll after #148634 landed.

; RUN: llc < %s -march=arm64 -mattr=+neon | FileCheck %s declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>) declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>) ;===---------------------------------------------------------------------===; ; Test: freeze does not block uhadd instruction selection ;===---------------------------------------------------------------------===; define <8 x i16> @uhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) { ; CHECK-LABEL: uhadd_freeze: ; CHECK: uhadd ; CHECK: and %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1) %frozen = freeze <8 x i16> %avg %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31> ret <8 x i16> %masked } ;===---------------------------------------------------------------------===; ; Test: freeze does not block urhadd instruction selection ;===---------------------------------------------------------------------===; define <8 x i16> @urhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) { ; CHECK-LABEL: urhadd_freeze: ; CHECK: urhadd ; CHECK: and %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1) %frozen = freeze <8 x i16> %avg %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31> ret <8 x i16> %masked } ;===---------------------------------------------------------------------===; ; Test: freeze does not block shadd instruction selection ;===---------------------------------------------------------------------===; define <8 x i16> @shadd_freeze(<8 x i16> %a0, <8 x i16> %a1) { ; CHECK-LABEL: shadd_freeze: ; CHECK: shadd ; CHECK: and %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %m0, <8 x i16> %m1) %frozen = freeze <8 x i16> %avg %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31> ret <8 x i16> %masked } ;===---------------------------------------------------------------------===; ; Test: freeze does not block srhadd instruction selection ;===---------------------------------------------------------------------===; define <8 x i16> @srhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) { ; CHECK-LABEL: srhadd_freeze: ; CHECK: srhadd ; CHECK: and %m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15> %avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1) %frozen = freeze <8 x i16> %avg %masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31> ret <8 x i16> %masked }

Yes, a hadd-freeze.ll file should be fine - please get the patch updated to allow us to review the diffs. I'm expecting you will need to come up with a different approach for signed hadd tests as otherwise they will just fold to unsigned variants if they are known poisitive.

llvm/test/CodeGen/AArch64/neon-hadd-freeze.ll

davemgreen · 2025-07-17T10:20:50Z

llvm/test/CodeGen/AArch64/neon-hadd-freeze.ll

+declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
+declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)
+
+;===---------------------------------------------------------------------===;


Remove these ;===-----'s. They don't add much.

…ILS as safe This patch updates `SelectionDAG::canCreateUndefOrPoison` to indicate that `ISD::AVGFLOORS` and `ISD::AVGCEILS` do not introduce poison or undef values. | Opcode | Operation Type | Alive2 Proof Link | |--------------|-------------------|--------------------| | `AVGFLOORS` | Signed Floor Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/Dwy8a5) | | `AVGCEILS` | Signed Ceil Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/_JKF8A) | | `AVGFLOORU` | Unsigned Floor Avg| [Alive2 Link](https://alive2.llvm.org/ce/z/2-B6RM) | | `AVGCEILU` | Unsigned Ceil Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/t5WZ6K) | These patterns are safe due to the use of `sext i8` into `i32`, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to `i8`. Includes test coverage to ensure correctness.

RKSimon · 2025-07-17T11:02:41Z

llvm/test/CodeGen/AArch64/neon-hadd-freeze.ll

+; CHECK-NEXT:    and v1.16b, v1.16b, v2.16b
+; CHECK-NEXT:    movi v2.8h, #31
+; CHECK-NEXT:    uhadd v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    and v0.16b, v0.16b, v2.16b


This and should have gone - any idea why it hasn't?

RKSimon · 2025-07-17T11:02:48Z

llvm/test/CodeGen/AArch64/neon-hadd-freeze.ll

+; CHECK-NEXT:    and v1.16b, v1.16b, v2.16b
+; CHECK-NEXT:    movi v2.8h, #31
+; CHECK-NEXT:    urhadd v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    and v0.16b, v0.16b, v2.16b


This and should have gone - any idea why it hasn't?

LLVM must be conservative around freeze to avoid miscompilation. Instruction selection may proceed, but value range assumptions cannot be made post-freeze.

#58321

Does it help if you apply #149323?

Thanks, I tested with PR [#149323]
After rebuilding and rerunning llc on the neon-hadd-freeze.ll test, the instruction:

and v1.16b, v1.16b, v2.16b

is still present in the output for both uhadd_freeze and urhadd_freeze.

So it seems the patch does not remove the redundant and as expected.

…des can't create undef/poison Possible fix for failed fold in llvm#148191

llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels Jul 11, 2025

RKSimon mentioned this pull request Jul 11, 2025

[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::AVGFLOORS/AVGFLOORU/AVGCEILS/AVGCEILU handling + tests #147696

Open

2 tasks

aabhinavg1 requested review from RKSimon and EugeneZelenko July 11, 2025 10:37

RKSimon changed the title ~~[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCE…~~ [DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe Jul 11, 2025

RKSimon reviewed Jul 11, 2025

View reviewed changes

aabhinavg1 self-assigned this Jul 11, 2025

EugeneZelenko removed their request for review July 11, 2025 13:24

RKSimon reviewed Jul 14, 2025

View reviewed changes

This was referenced Jul 14, 2025

[AArch64] AArch64TargetLowering::computeKnownBitsForTargetNode - add support for AArch64ISD::MOVIshift #148596

Closed

Adding abdu/abds to canCreateUndefOrPoison #149017

Open

aabhinavg1 force-pushed the D147696_SelectionDAG branch from 3e3632e to 6999bf3 Compare July 17, 2025 09:44

davemgreen reviewed Jul 17, 2025

View reviewed changes

aabhinavg1 force-pushed the D147696_SelectionDAG branch from 6999bf3 to ae97069 Compare July 17, 2025 10:48

RKSimon reviewed Jul 17, 2025

View reviewed changes

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Jul 17, 2025

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::MOVI opco…

8c0a882

…des can't create undef/poison Possible fix for failed fold in llvm#148191

RKSimon mentioned this pull request Jul 17, 2025

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::MOVI opcodes can't create undef/poison #149323

Draft

[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191

Are you sure you want to change the base?

[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191

Conversation

aabhinavg1 commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Alive2 Proofs for AVG Operations

Uh oh!

llvmbot commented Jul 11, 2025

Uh oh!

llvmbot commented Jul 11, 2025

Uh oh!

jayfoad commented Jul 11, 2025

Uh oh!

jayfoad commented Jul 11, 2025

Uh oh!

aabhinavg1 commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RKSimon commented Jul 11, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aabhinavg1 Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Alive2 Proofs for AVG Operations

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davemgreen commented Jul 11, 2025

Uh oh!

aabhinavg1 commented Jul 11, 2025

Uh oh!

RKSimon commented Jul 11, 2025

Uh oh!

aabhinavg1 commented Jul 11, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aabhinavg1 Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aabhinavg1 commented Jul 11, 2025 •

edited

Loading

aabhinavg1 commented Jul 11, 2025 •

edited

Loading

aabhinavg1 Jul 11, 2025 •

edited

Loading

aabhinavg1 Jul 17, 2025 •

edited

Loading